The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources
Abstract
Augmented and alternative communication (AAC) devices enable users with certain communication disabilities to participate in everyday conversations. Such devices often rely on statistical language models to improve text entry by offering word predictions. These predictions can be improved if the language model is trained on data that closely reflects the style of the users’ intended communications. Unfortunately, there is no large dataset consisting of genuine AAC messages. In this paper we demonstrate how we can crowdsource the creation of a large set of fictional AAC messages. We show that these messages model conversational AAC better than the currently used datasets based on telephone conversations or newswire text. We leverage our crowdsourced messages to intelligently select sentences from much larger sets of Twitter, blog and Usenet data. Compared to a model trained only on telephone transcripts, our best performing model reduced perplexity on three test sets of AAC-like communications by 60–82% relative. This translated to a potential keystroke savings in a predictive keyboard interface of 5–11%.
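The two metrics quoted in the abstract, relative perplexity reduction and keystroke savings, can be illustrated with a minimal sketch. This is not the authors' implementation: the add-one smoothed unigram model, the toy sentences, and all function names below are assumptions chosen only to keep the example self-contained.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count word occurrences in whitespace-tokenized sentences."""
    return Counter(word for s in sentences for word in s.split())


def perplexity(counts, sentences, vocab_size):
    """Per-word perplexity under an add-one smoothed unigram model."""
    total = sum(counts.values())
    log_prob = 0.0
    n_words = 0
    for s in sentences:
        for word in s.split():
            p = (counts[word] + 1) / (total + vocab_size)
            log_prob += math.log2(p)
            n_words += 1
    return 2 ** (-log_prob / n_words)


def relative_reduction(baseline, improved):
    """Relative reduction of `improved` versus `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


def keystroke_savings(chars_in_text, keys_pressed):
    """Percentage of keystrokes saved versus typing every character."""
    return 100.0 * (chars_in_text - keys_pressed) / chars_in_text


# Hypothetical toy data: an in-domain (conversational) training set,
# an out-of-domain (news-like) training set, and a conversational test set.
in_domain = ["i am hungry", "i need help please", "thank you"]
out_of_domain = ["the market closed higher today", "the president spoke today"]
test_set = ["i need help", "thank you"]

# Use one shared vocabulary so both models are smoothed identically.
vocab = {w for s in in_domain + out_of_domain + test_set for w in s.split()}

ppl_in = perplexity(train_unigram(in_domain), test_set, len(vocab))
ppl_out = perplexity(train_unigram(out_of_domain), test_set, len(vocab))

print(f"in-domain perplexity:     {ppl_in:.2f}")
print(f"out-of-domain perplexity: {ppl_out:.2f}")
print(f"relative reduction:       {relative_reduction(ppl_out, ppl_in):.1f}%")
```

Lower perplexity means the model assigns higher probability to the test messages, which is why training data that matches the users' style helps word prediction. The toy numbers printed here have no relation to the results reported in the abstract.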
Similar resources
A crowdsourcing method to develop virtual human conversational agents
Educators in medicine, psychology, and the military want to provide their students with interpersonal skills practice. Virtual humans offer structured learning of interview skills, can facilitate learning about unusual conditions, and are always available. However, the creation of virtual humans with the ability to understand and respond to natural language requires costly engineering by conver...
Tag Questions in Persian: Investigating the Conversational Functions
This article intends to identify the use and typify the functions of tag questions (TQs) in Persian everyday conversations and dialogic interaction. The analyses were made based on two data sources: A documentary film titled Commander in which the participants are engaged in free interactions, and an audio-recorded instrument named CALLFRIEND which consists of Iranian native...
The Relationship between Self-esteem and Conversational Dominance of Iranian EFL Learners’ Speaking
The crucial role of affective factors like anxiety, inhibition, motivation and self-esteem has long been of interest in the field of language learning due to their enormous association with the cognitive processes involved in performance in a second or foreign language. This study aimed at investigating the relationship between Iranian EFL learners’ self-esteem and conversational dominance in ...
Perform Three Data Mining Tasks with Crowdsourcing Process
For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...
A Conversational Movie Search System Based on Conditional Random Fields
Online streaming companies such as Netflix have become dominant in the media distribution sector. However, such media delivery services often support very rudimentary search, especially for natural language queries. To provide a more natural search interface, we have developed a conversational movie search system, which parses the recognition hypothesis of a spoken query into semantic classes u...